Parallel knowledge gradient method


e8f2779682fd11fa2067beffc27a9192-Supplemental.pdf

Neural Information Processing Systems

In this analysis, we assume that evaluating the GP prior mean and kernel functions (and the corresponding derivatives) takes O(1) time. For each fantasy model, we need to compute the posterior mean and covariance matrix at the L points (x, w_{1:L}) on which we draw the sample paths. This results in a total cost of O(KML^2) to generate all samples. The SAA approach replaces a stochastic optimization problem with a deterministic approximation, which can be optimized efficiently. Suppose that we are interested in the optimization problem min_x E_ω[h(x, ω)].
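To make the last point concrete, here is a minimal sketch of the SAA idea under illustrative assumptions: a hypothetical objective h(x, w), a fixed set of samples w_1, ..., w_N drawn once up front, and an off-the-shelf deterministic optimizer applied to the resulting sample average.

```python
# Minimal sketch of sample average approximation (SAA): the stochastic problem
# min_x E_w[h(x, w)] is replaced by a deterministic average over fixed samples
# w_1, ..., w_N, which is then handed to a standard optimizer.
# The function h below is a hypothetical stand-in, not from the paper.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

def h(x, w):
    # Hypothetical noisy objective: a quadratic whose optimum is shifted by w.
    return np.sum((x - w) ** 2)

# Fix the samples once, up front; the resulting averaged objective is deterministic.
W = rng.normal(loc=1.0, scale=0.5, size=(64, 2))  # N = 64 samples of w in R^2

def saa_objective(x):
    return np.mean([h(x, w) for w in W])

result = minimize(saa_objective, x0=np.zeros(2), method="L-BFGS-B")
print(result.x)  # close to the sample mean of the w's for this toy h
```

Because the samples are fixed, the averaged objective is an ordinary deterministic function, so gradient-based or quasi-Newton methods can be applied to it directly.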


The Parallel Knowledge Gradient Method for Batch Bayesian Optimization

Wu, Jian, Frazier, Peter

Neural Information Processing Systems

In many applications of black-box optimization, one can evaluate multiple points simultaneously, e.g. when evaluating the performances of several different neural network architectures in a parallel computing environment. In this paper, we develop a novel batch Bayesian optimization algorithm --- the parallel knowledge gradient method. By construction, this method provides the one-step Bayes optimal batch of points to sample. We provide an efficient strategy for computing this Bayes-optimal batch of points, and we demonstrate that the parallel knowledge gradient method finds global optima significantly faster than previous batch Bayesian optimization algorithms on both synthetic test functions and when tuning hyperparameters of practical machine learning algorithms, especially when function evaluations are noisy.
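The following is a small, hypothetical sketch (not the paper's implementation) of the quantity a knowledge-gradient-style batch criterion targets: the expected increase in the maximum of the posterior mean after a candidate batch is observed, estimated by averaging over fantasy outcomes drawn from the current Gaussian process posterior. The toy 1-D function, grid discretization, kernel, and two-point batch below are illustrative assumptions.

```python
# Illustrative Monte Carlo "fantasy" estimate of a batch knowledge-gradient-style
# value on a toy 1-D problem, using scikit-learn's GP with fixed hyperparameters.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
f = lambda x: np.sin(3 * x).ravel()            # hypothetical black-box function

X_obs = rng.uniform(0, 2, size=(5, 1))
y_obs = f(X_obs) + 0.1 * rng.normal(size=5)    # noisy evaluations

# optimizer=None keeps the kernel hyperparameters fixed, for simplicity.
gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.5), alpha=0.1**2, optimizer=None)
gp.fit(X_obs, y_obs)

grid = np.linspace(0, 2, 200).reshape(-1, 1)   # discretization for the inner max
mu_best_now = gp.predict(grid).max()

def batch_kg(X_batch, n_fantasies=50):
    """Monte Carlo estimate of the expected gain in max posterior mean from X_batch."""
    # Draw fantasy function values at the batch from the current GP posterior.
    fantasies = gp.sample_y(X_batch, n_samples=n_fantasies, random_state=1)
    gains = []
    for k in range(n_fantasies):
        y_fant = np.concatenate([y_obs, fantasies[:, k]])
        X_all = np.vstack([X_obs, X_batch])
        gp_k = GaussianProcessRegressor(kernel=RBF(length_scale=0.5), alpha=0.1**2,
                                        optimizer=None)
        gp_k.fit(X_all, y_fant)                # condition on the fantasized batch
        gains.append(gp_k.predict(grid).max() - mu_best_now)
    return float(np.mean(gains))

candidate_batch = np.array([[0.3], [1.7]])     # q = 2 points evaluated in parallel
print(batch_kg(candidate_batch))
```

In practice one would maximize such an estimate over the batch itself (the paper provides an efficient strategy for doing so); the sketch above only evaluates it for one fixed candidate batch.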



Reviews: The Parallel Knowledge Gradient Method for Batch Bayesian Optimization

Neural Information Processing Systems

The paper is well written and easy to follow. Parallelization of BO is an important subject for practical hyperparameter optimization, and the proposed approach is interesting and more elegant than most existing approaches I am aware of. The fact that a Bayes-optimal batch is determined is very promising. The authors assume independent normally distributed errors, which is common in most BO methods based on Gaussian processes. However, in hyperparameter optimization this assumption is problematic, since measurement errors represent the difference between generalization performance and empirical estimates (e.g., through cross-validation).
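The measurement noise the reviewer refers to can be made concrete by repeating cross-validation with different splits for a fixed hyperparameter setting; the spread of the resulting scores is the error the optimizer actually observes. The dataset, model, and hyperparameter value below are illustrative choices, not taken from the paper or the review.

```python
# Illustration: the "observation noise" in hyperparameter optimization is the gap
# between an empirical estimate (e.g., cross-validation) and true generalization
# performance. Repeating CV with different splits exposes this noise directly.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import KFold, cross_val_score
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

scores = []
for seed in range(20):
    cv = KFold(n_splits=5, shuffle=True, random_state=seed)
    scores.append(cross_val_score(SVC(C=1.0, gamma="scale"), X, y, cv=cv).mean())

# The spread across repetitions is the measurement error seen by the optimizer.
print(f"mean CV accuracy: {np.mean(scores):.3f}, std across splits: {np.std(scores):.4f}")
```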

